April 2, 2018

About the analysis

Prepared for the KPMG R Data Challenge.

The analysis focuses on the key places of interest over the course of the day: which places come alive earlier and which later in the evening.


Which data were used?

  • Instagram posts received via the Instagram API
  • Location: Prague - Karlin (and surrounding areas)
  • Period: from Jan 01, 2016 to Mar 30, 2016
  • Number of rows after data cleaning: 9290

When is Karlin alive on Instagram?

What are the key spots?

Top 25 hashtags used

Minimum hashtag frequency: 162
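
The ranking behind a slide like this can be sketched in a few lines of base R. This is only an illustration: the `captions` vector is invented toy data, and the names `tags`, `freq`, `top` and `min_freq` are hypothetical, not taken from the original analysis. The "minimum frequency" is read here as the count of the least frequent hashtag that still makes the top list.

```r
# Hypothetical toy data: one caption per post (not the challenge data).
captions <- c(
  "Lunch in #karlin #food #foodporn",
  "Morning run #fitness #karlin",
  "Best burger #food #karlin",
  "#view from #vitkov"
)

# Extract every hashtag across all captions and lower-case it.
matches <- regmatches(captions, gregexpr("#[[:alnum:]_]+", captions))
tags <- tolower(unlist(matches))

# Count occurrences and sort from most to least frequent.
freq <- sort(table(tags), decreasing = TRUE)

# Keep the top n hashtags (the slide uses n = 25; the toy uses n = 2).
n <- 2
top <- freq[seq_len(min(n, length(freq)))]

# The minimum frequency is the count of the last hashtag that was kept.
min_freq <- min(top)

print(top)
print(min_freq)
```

With real data, `n` would be 25 and `captions` would hold the cleaned post texts.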

Hashtags are used at specific spots

Clustering

The correspondence analysis results indicate three clusters:
1. party,
2. food,
3. beauty.

K-means clustering is performed here (using the factoextra library). Only spots with at least 20 posts are selected.

The following variables are used:
- hashtag profile of each spot
- peak posting time on free days and at the weekend
- share of posts made on free days versus working days
- number of likes per post
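
As a rough illustration of how variables like these can feed a clustering, the sketch below builds a small per-spot feature table and runs k-means on it. Everything in it is an assumption made for the example: the values are invented, the column names (`share_food`, `share_fitness`, `peak_hour`, `share_free`, `likes_post`) are hypothetical, and it calls base R's `kmeans()` directly rather than any factoextra helper.

```r
set.seed(42)  # k-means starts from random centres

# Hypothetical per-spot features (invented values, not the challenge data).
spots <- data.frame(
  share_food    = c(0.70, 0.65, 0.05, 0.10, 0.02, 0.04),  # hashtag profile
  share_fitness = c(0.05, 0.10, 0.80, 0.75, 0.03, 0.01),  # hashtag profile
  peak_hour     = c(12.5, 13.0, 18.0, 17.5, 22.0, 23.0),  # peak posting time
  share_free    = c(0.30, 0.35, 0.25, 0.30, 0.60, 0.55),  # free-day share
  likes_post    = c(50, 55, 140, 130, 60, 65),            # likes per post
  row.names     = paste0("spot_", 1:6)
)

# Standardise the columns so that no variable dominates through its units.
x <- scale(spots)

# k = 3 is an arbitrary choice for this toy example.
fit <- kmeans(x, centers = 3, nstart = 25)

print(fit$cluster)
```

Scaling matters here because likes per post and hashtag shares live on very different scales; without it the likes column would drive most of the distance.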

Clustering tree

Clustering results

  • We can see about 4 groups in the dendrogram
  • We cut the dendrogram at heights 40 and 50 and merge the groups on the right into one
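
One possible reading of the two-height cut is sketched below in base R: cut the tree low to get fine groups, cut it higher to find the branch that should be treated as one group, then recode that branch. The data, the heights (2 and 5 instead of 40 and 50) and the names `fine`, `coarse`, `merge_branch` and `final` are all hypothetical; the original recoding may have been done differently.

```r
# Hypothetical standardised features for eight spots (invented values).
x <- rbind(
  c(0, 0),  c(0, 1),   # group A
  c(3, 0),  c(3, 1),   # group B, close to A
  c(30, 0), c(30, 1),  # group C
  c(34, 0), c(34, 1)   # group D, close to C
)
rownames(x) <- paste0("spot_", 1:8)

# Agglomerative clustering with complete linkage on Euclidean distances.
tree <- hclust(dist(x), method = "complete")

# Cut the same tree at two heights.
fine   <- cutree(tree, h = 2)  # low cut: many small groups
coarse <- cutree(tree, h = 5)  # high cut: few large groups

# Keep the fine groups, but merge everything inside one coarse branch.
# Which branch to merge is a manual choice made by looking at the dendrogram.
merge_branch <- coarse[["spot_5"]]
final <- ifelse(coarse == merge_branch, max(fine) + 1L, fine)

# Relabel the groups as consecutive integers.
final <- as.integer(factor(final))
names(final) <- rownames(x)

print(table(final))
```

In this toy example the low cut alone would split the right-hand branch in two and the high cut alone would also fuse the left-hand groups, so combining both cuts is what gives the intermediate grouping.
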
Hierarchical clustering results

Cluster    Peak time  % free day  Likes/post  Top 5 hashtags
Cluster 1  17:39      38.8%       63.3        #city #street #trip #view #vitkov
Cluster 2  17:14      30.1%       137.3       #fitness #haircut #lifestyle #motivation #workout
Cluster 3  16:50      21.3%       32.8        #dance #girls #poledance #repost #studioallurepraha
Cluster 4  16:47      33.2%       52.4        #delicious #food #foodporn #instafood #lunch

Clusters on the map

Thank you!